-
Notifications
You must be signed in to change notification settings - Fork 14.7k
[libclc] Implement __clc_rsqrt with __ocml_rsqrt_* functions #152436
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Motivation is to upstream use of __ocml_rsqrt_ in https://github.com/intel/llvm/blob/sycl/libclc/libspirv/lib/amdgcn-amdhsa/math/rsqrt.cl llvm-diff shows vectorized calls of llvm.sqrt.v2f32 and fdiv are scalarized: Old: > %2 = tail call contract <2 x float> @llvm.sqrt.v2f32(<2 x float> %0), !fpmath !5 > %3 = fdiv contract <2 x float> splat (float 1.000000e+00), %2, !fpmath !4 !4 = !{float 2.500000e+00} !5 = !{float 3.000000e+00} New: < %2 = extractelement <2 x float> %0, i64 0 < %3 = tail call float @__ocml_rsqrt_f32(float noundef %2) < %4 = insertelement <2 x float> poison, float %3, i64 0 < %5 = extractelement <2 x float> %0, i64 1 < %6 = tail call float @__ocml_rsqrt_f32(float noundef %5) < %7 = insertelement <2 x float> %4, float %6, i64 1
@arsenm I'm not sure if the change is an improvement, please review. |
You can test this locally with the following command:git-clang-format --diff HEAD~1 HEAD --extensions cl -- libclc/clc/lib/amdgcn/math/clc_rsqrt.cl View the diff from clang-format here.diff --git a/libclc/clc/lib/amdgcn/math/clc_rsqrt.cl b/libclc/clc/lib/amdgcn/math/clc_rsqrt.cl
index 4a9ae94b7..69164fff7 100644
--- a/libclc/clc/lib/amdgcn/math/clc_rsqrt.cl
+++ b/libclc/clc/lib/amdgcn/math/clc_rsqrt.cl
@@ -10,13 +10,17 @@
float __ocml_rsqrt_f32(float);
-_CLC_OVERLOAD _CLC_DEF float __clc_rsqrt(float x) { return __ocml_rsqrt_f32(x); }
+_CLC_OVERLOAD _CLC_DEF float __clc_rsqrt(float x) {
+ return __ocml_rsqrt_f32(x);
+}
#ifdef cl_khr_fp64
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
double __ocml_rsqrt_f64(double);
-_CLC_OVERLOAD _CLC_DEF double __clc_rsqrt(double x) { return __ocml_rsqrt_f64(x); }
+_CLC_OVERLOAD _CLC_DEF double __clc_rsqrt(double x) {
+ return __ocml_rsqrt_f64(x);
+}
#endif
@@ -24,9 +28,7 @@ _CLC_OVERLOAD _CLC_DEF double __clc_rsqrt(double x) { return __ocml_rsqrt_f64(x)
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
half __ocml_rsqrt_f16(half);
-_CLC_OVERLOAD _CLC_DEF half __clc_rsqrt(half x) {
- return __ocml_rsqrt_f16(x);
-}
+_CLC_OVERLOAD _CLC_DEF half __clc_rsqrt(half x) { return __ocml_rsqrt_f16(x); }
#endif
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ocml rsqrt should be deleted, this is implementable by
float rsqrt(float x) {
#pragma clang fp contract(fast)
return 1.0f / __builtin_elementwise_sqrt(x);
}
thanks @arsenm. So current generic implementation is good: llvm-project/libclc/clc/lib/generic/math/clc_rsqrt.inc Lines 10 to 13 in eb0ddba
Close this PR. |
…#19722) Before this PR, libspirv-nvptx64--nvidiacl uses __nv_rsqrtf/isinf/copysign in these two functions, and libspirv-amdgcn--amdhsa uses __ocml_rsqrt/copysign. We can upstream use of target built-in to CLC library: Use of __nv_rsqrtf/isinf is to be upstreamed by llvm/llvm-project#150174. Then the changes to libspirv-nvptx64--nvidiacl should be minimized. Generic implementation of __clc_rsqrt is better than __ocml_rsqrt according to llvm/llvm-project#152436. Use of these target built-in scalarize llvm vector intrinsic. So it is likely better to use generic implementation. In addition, use of generic implementation aligns with OpenCL library. Delete libspirv/generic/math/floatn.inc. Delete unused clc/relational/floatn.inc and libspirv/generic/math/minmag.inc.
Motivation is to upstream use of _ocml_rsqrt in
https://github.com/intel/llvm/blob/sycl/libclc/libspirv/lib/amdgcn-amdhsa/math/rsqrt.cl
llvm-diff shows vectorized calls of llvm.sqrt.v2f32 and fdiv are scalarized: Old:
> %2 = tail call contract <2 x float> @llvm.sqrt.v2f32(<2 x float> %0), !fpmath !5
> %3 = fdiv contract <2 x float> splat (float 1.000000e+00), %2, !fpmath !4
!4 = !{float 2.500000e+00}
!5 = !{float 3.000000e+00}
New:
< %2 = extractelement <2 x float> %0, i64 0
< %3 = tail call float @__ocml_rsqrt_f32(float noundef %2)
< %4 = insertelement <2 x float> poison, float %3, i64 0
< %5 = extractelement <2 x float> %0, i64 1
< %6 = tail call float @__ocml_rsqrt_f32(float noundef %5)
< %7 = insertelement <2 x float> %4, float %6, i64 1